Multiverse Meta-Analysis¶

This notebook contains all code required for Multiverse Meta-Analyses, including the generation of specifications, bootstrap data, and visualizations.

Imports¶

In [ ]:
import numpy as np
import pandas as pd

from bootstrap import generate_boot_data
from config import read_config
from data import prepare_data
from plotting import (get_cluster_fill_data, get_spec_fill_data,
                      get_colors, plot_treemap, plot_multiverse,
                      plot_caterpillar, plot_sample_size, plot_cluster_size,
                      plot_spec_tiles, plot_cluster_tiles, plot_inferential,
                      plot_p_hist)
from specs import generate_specs
from user_data import preprocess_data

import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"
In [ ]:
%load_ext autoreload
%autoreload 2

Dashboard¶

The interactive Dashboard can be launched from this notebook.

In [ ]:
%run -i "./dashboard.py"

Constants¶

In this cell, set the title, the working directory, and the path to the dataset for this analysis. The paths to the config, preprocessed data, specs, and bootstrap data are derived from the working directory and the title. This naming convention can be changed, but the prefixes (i.e. boot, config, data, and specs) are required for the Dashboard to work. The configuration file must exist; all other data can be either generated or loaded, depending on the boolean flags. Generated data is stored at the specified paths; otherwise, existing data is loaded from those paths.

In [ ]:
TITLE = "R2D4D_3"
DIR = "../examples/R2D4D"
DATA_PATH = f"{DIR}/R2D4D.csv"

# TITLE = "Chernobyl_2"
# DIR = "../examples/Chernobyl"
# DATA_PATH = f"{DIR}/Chernobyl.rda"

# TITLE = "IandR_2"
# DIR = "../examples/IandR"
# DATA_PATH = f"{DIR}/iandr.sav"

PREPROCESS_DATA = True # Load or preprocess data
GENERATE_SPECS = True # Load or generate specs
GENERATE_BOOTDATA = True # Load or generate boot data

PP_DATA_PATH = f"{DIR}/data_{TITLE}.csv"
CONFIG_PATH = f"{DIR}/config_{TITLE}.json"
SPECS_PATH = f"{DIR}/specs_{TITLE}.csv"
BOOT_PATH = f"{DIR}/boot_{TITLE}.csv"
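
The naming convention described above can also be expressed as a small helper. This is a minimal sketch, not part of the project: `build_paths` is a hypothetical function, and the file extensions are taken from the constants in the cell above.

```python
from pathlib import Path

# Hypothetical helper (not part of the project): derive the four file paths
# from the working directory and the title, following the prefix convention.
def build_paths(directory: str, title: str) -> dict[str, Path]:
    base = Path(directory)
    extensions = {"data": "csv", "config": "json", "specs": "csv", "boot": "csv"}
    return {
        prefix: base / f"{prefix}_{title}.{ext}"
        for prefix, ext in extensions.items()
    }

paths = build_paths("../examples/R2D4D", "R2D4D_3")
for prefix, path in paths.items():
    print(f"{prefix:>6}: {path}")

# The configuration file is the only one that must exist beforehand.
config_exists = paths["config"].exists()
print(f"config exists: {config_exists}")
```

Checking for the configuration file before running the remaining cells gives an early, explicit failure instead of an error inside `read_config()`.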

Configuration¶

In this cell, the configuration file is processed. The cell prints out the parsed configuration, so the user can double-check if the result is as expected.

In [ ]:
config = read_config(path=CONFIG_PATH)
if config is not None:
    c_info = [
        f"{config['level']} - Level Meta-Analysis",
        f"   Minimum Nr. of Samples to include Specification: {config['k_min']}",
        f"   Bootstrap Iterations: {config['n_boot_iter']}",
        f"   {config['n_which']} Which-Factors:",
        *[f"     {k} : {(', ').join(v)}" for k, v in config['which_lists'].items()],
        f"   {config['n_how']} How-Factors:",
        *[f"     {k} : {(', ').join(v)}" for k, v in config['how_lists'].items()],
        f"   Labels",
        *[f"     {l}" for l in config['labels']],
        f"   Column-Map",
        *[f"     {k} : {v}" for k, v in config['colmap'].items()]
    ]
    print(("\n").join(c_info))
3 - Level Meta-Analysis
   Minimum Nr. of Samples to include Specification: 2
   Bootstrap Iterations: 100
   6 Which-Factors:
     sex : men, women, all_sex
     method : direct, image, all_method
     age_group : adults, non-adults, all_age_group
     sample : healthy, clinical, all_sample
     race : white, other, all_race
     published_estimate : yes, no, all_published_estimate
   3 How-Factors:
     effect : z
     ma_method : REML, ML
     test : t-test, z-test
   Labels
     sex: male
     sex: female
     sex: either
     measure: direct
     measure: image
     measure: either
     age: adults
     age: non-adults
     age: either
     group: healthy
     group: patients
     group: either
     ethnicity: White
     ethnicity: non-White
     ethnicity: either
     report: full
     report: not
     report: either
     metric: z
     model: REML
     model: ML
     test: t
     test: z
   Column-Map
     key_c : Study_name
     key_c_id : c_id
     key_e_id : e_id
     key_z : z
     key_z_se : z_se
     key_z_var : z_var
     key_r : r
     key_r_se : r_se
     key_r_var : r_var
     key_main_es : z
     key_main_es_se : z_se
     key_n : N

Preprocess and Prepare Data¶

In this cell, the dataset is either preprocessed and stored at PP_DATA_PATH, or the preprocessed dataset is loaded from PP_DATA_PATH. The cell prints out the head and the dimensions of the data. If preprocessing is desired, the function preprocess_data() must be defined by the user in the module user_data (imported above), i.e. the file user_data.py.

In [ ]:
if PREPROCESS_DATA:
    ma_data = preprocess_data(DATA_PATH, title=TITLE)
else:
    ma_data = pd.read_csv(PP_DATA_PATH)
print(f"Data Shape: {ma_data.shape}")
ma_data.head()
Data Shape: (31, 16)
Out[ ]:
Study_name publ_yr publ_yr_recoded sex age_group sample race method published_estimate N r r_se z z_se z_var r_var
0 Manning (2003) 2003 1 men adults healthy white direct yes 50 0.2900 0.133598 0.298566 0.145865 0.021277 0.017848
1 Latourelle (2008) 2008 6 men adults healthy white image no 35 0.0000 0.176777 0.000000 0.176777 0.031250 0.031250
2 Latourelle (2008) 2008 6 women adults healthy white image no 72 0.0000 0.120386 0.000000 0.120386 0.014493 0.014493
3 Mas (2009) 2009 7 men adults healthy white image no 72 -0.0685 0.119821 -0.068607 0.120386 0.014493 0.014357
4 Mas (2009) 2009 7 men adults clinical white image no 63 0.0021 0.129099 0.002100 0.129099 0.016667 0.016667
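
The effect-size columns in the rows shown above are consistent with the Fisher z-transformation of the correlation r. The sketch below reproduces the first row (r = 0.29, N = 50). The formulas are inferred from the printed values, not from the preprocessing code, which is user-defined; `fisher_z_row` is a hypothetical name.

```python
import math

# Hypothetical helper: formulas inferred from the displayed values,
# not taken from the project's preprocessing code.
def fisher_z_row(r: float, n: int) -> dict[str, float]:
    z = math.atanh(r)                       # Fisher z-transformation of r
    z_se = 1.0 / math.sqrt(n - 3)           # standard error of z
    r_se = (1.0 - r**2) / math.sqrt(n - 3)  # standard error of r (as inferred)
    return {
        "z": z, "z_se": z_se, "z_var": z_se**2,
        "r_se": r_se, "r_var": r_se**2,
    }

row = fisher_z_row(r=0.29, n=50)
for name, value in row.items():
    print(f"{name:>5}: {value:.6f}")
```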

In this cell, the preprocessed dataset is prepared for meta-analysis. Preparation adds cluster and effect IDs, sets datatypes, etc. For details, consult the function documentation of prepare_data(). The cell prints out the head and the dimensions of the prepared data.

In [ ]:
data = prepare_data(config["colmap"], data=ma_data)
print(f"Data Shape: {data.shape}")
data.head()
Data Shape: (31, 18)
Out[ ]:
c_id Study_name e_id publ_yr publ_yr_recoded sex age_group sample race method published_estimate N r r_se z z_se z_var r_var
0 1 Manning (2003) 1 2003 1 men adults healthy white direct yes 50 0.2900 0.133598 0.298566 0.145865 0.021277 0.017848
1 2 Latourelle (2008) 2 2008 6 men adults healthy white image no 35 0.0000 0.176777 0.000000 0.176777 0.031250 0.031250
2 2 Latourelle (2008) 3 2008 6 women adults healthy white image no 72 0.0000 0.120386 0.000000 0.120386 0.014493 0.014493
3 3 Mas (2009) 4 2009 7 men adults healthy white image no 72 -0.0685 0.119821 -0.068607 0.120386 0.014493 0.014357
4 3 Mas (2009) 5 2009 7 men adults clinical white image no 63 0.0021 0.129099 0.002100 0.129099 0.016667 0.016667
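
The c_id and e_id columns in the output above can be reproduced with a few lines of pandas. This is only a sketch of one way to assign such IDs, assuming clusters are numbered in order of first appearance and effects are numbered row by row; the actual logic of prepare_data() may differ.

```python
import pandas as pd

# Study names of the first five rows shown above.
df = pd.DataFrame({
    "Study_name": [
        "Manning (2003)",
        "Latourelle (2008)",
        "Latourelle (2008)",
        "Mas (2009)",
        "Mas (2009)",
    ]
})

# Cluster ID: one integer per study, numbered by first appearance (1-based).
df["c_id"] = pd.factorize(df["Study_name"])[0] + 1
# Effect ID: one integer per row (1-based).
df["e_id"] = range(1, len(df) + 1)

print(df[["c_id", "Study_name", "e_id"]])
```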

Specifications¶

In this cell, the specifications are either generated and stored at SPECS_PATH, or loaded from SPECS_PATH. For details, consult the function documentation of generate_specs().

In [ ]:
if GENERATE_SPECS:
    specs = generate_specs(
        data,
        config["which_lists"],
        config["how_lists"],
        config["colmap"],
        config["k_min"],
        config["level"],
        SPECS_PATH
    )
else:
    specs = pd.read_csv(SPECS_PATH)
print(specs.shape)
specs.head()
100%|██████████| 2916/2916 [00:39<00:00, 74.77it/s] 
(340, 20)

Out[ ]:
sex method age_group sample race published_estimate effect ma_method test mean lb ub p k set set_es kc full_set rank ci
21 men image adults healthy white no z REML z-test -0.046836 -0.237284 0.147079 0.637611 2 2,3 2,4 2 0 1 0.384363
20 men image adults healthy white no z REML t-test -0.046836 -0.864575 0.838899 0.719751 2 2,3 2,4 2 0 2 1.703474
22 men image adults healthy white no z ML t-test -0.046836 -0.864575 0.838899 0.719751 2 2,3 2,4 2 0 3 1.703474
23 men image adults healthy white no z ML z-test -0.046836 -0.237284 0.147079 0.637611 2 2,3 2,4 2 0 4 0.384363
43 men image adults all_sample white no z ML z-test -0.028613 -0.181069 0.125186 0.716490 3 2,3 2,4,5 2 0 5 0.306255
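
The progress bar above counts 2916 candidates, which matches the full factorial of the Which- and How-factor levels printed in the configuration (3^6 × 1 × 2 × 2). The sketch below checks this count, assuming that generate_specs() enumerates the Cartesian product of all factor levels; that assumption is not confirmed here. Only 340 specifications remain in the output, presumably after filtering (for example by k_min).

```python
from itertools import product
from math import prod

# Factor levels as printed in the Configuration section.
which_lists = {
    "sex": ["men", "women", "all_sex"],
    "method": ["direct", "image", "all_method"],
    "age_group": ["adults", "non-adults", "all_age_group"],
    "sample": ["healthy", "clinical", "all_sample"],
    "race": ["white", "other", "all_race"],
    "published_estimate": ["yes", "no", "all_published_estimate"],
}
how_lists = {
    "effect": ["z"],
    "ma_method": ["REML", "ML"],
    "test": ["t-test", "z-test"],
}

factors = {**which_lists, **how_lists}
n_candidates = prod(len(levels) for levels in factors.values())
candidates = list(product(*factors.values()))

print(n_candidates)  # 2916
print(dict(zip(factors, candidates[0])))
```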

Bootstrap Data¶

In this cell, the bootstrap data is either generated and stored at BOOT_PATH, or loaded from BOOT_PATH. For details, consult the function documentation of generate_boot_data().

In [ ]:
if GENERATE_BOOTDATA:
    boot_data = generate_boot_data(
        specs,
        config["n_boot_iter"],
        data,
        config["colmap"],
        config["level"],
        BOOT_PATH
    )
else:
    boot_data = pd.read_csv(BOOT_PATH)
print(boot_data.shape)
boot_data.head()
100%|██████████| 100/100 [05:35<00:00,  3.36s/it]
(340, 4)

Out[ ]:
rank obs boot_lb boot_ub
21 1 -0.046836 -0.201379 -0.028634
20 2 -0.046836 -0.201379 -0.028634
22 3 -0.046836 -0.200635 -0.028457
23 4 -0.046836 -0.200635 -0.028457
43 5 -0.028613 -0.168904 -0.024644
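
One common way to obtain rank-wise bounds such as boot_lb and boot_ub is to sort the specification estimates within each bootstrap iteration and then take percentiles per rank across iterations. The sketch below illustrates this on simulated numbers. It is an assumption for illustration only: whether generate_boot_data() works this way is not stated in this notebook, and the simulated estimates and the 2.5/97.5 percentiles are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_boot_iter, n_specs = 100, 340

# Simulated stand-in for bootstrap estimates (iterations x specifications);
# real values would come from re-fitting every specification per iteration.
boot_estimates = rng.normal(loc=0.0, scale=0.1, size=(n_boot_iter, n_specs))

# Sort within each iteration so that column j holds the rank-(j+1) estimate.
sorted_estimates = np.sort(boot_estimates, axis=1)

# Percentile bounds per rank across iterations.
boot_lb, boot_ub = np.percentile(sorted_estimates, [2.5, 97.5], axis=0)

print(boot_lb.shape, boot_ub.shape)
```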

Plotting¶

In this cell, the cluster and specification fill data for the respective tile maps are prepared, as well as the list of colors that constitutes the color scheme. For details, consult the respective function documentation.

In [ ]:
cluster_fill_data = get_cluster_fill_data(
    data,
    specs,
    config["colmap"]
)
spec_fill_data = get_spec_fill_data(
    config["n_which"],
    config["which_lists"],
    config["n_how"],
    config["how_lists"],
    specs
)
fill_levels = len(np.unique([v for v in spec_fill_data.values()]))
colors = get_colors(fill_levels)

Here we define important variables for plotting that will be reused in several plots, to improve readability.

In [ ]:
colmap = config["colmap"]
k_range = [config["k_min"], max(specs["k"])]
labels = config["labels"]
level = config["level"]
n_total_specs = len(specs)
title = config["title"]

Treemap¶

Treemap of the meta-analytic dataset. It visualizes each study and its reported effect sizes, with colors indicating the sample size N (hot colors for low, cold colors for high sample sizes). If a study reports multiple effect sizes, the size of its tile corresponds to the number of reported effect sizes, and the tile's color indicates the average sample size of those effects.

In [ ]:
treemap = plot_treemap(data, title, colmap)
treemap.show()

Inferential Specification Plot¶

In [ ]:
fig_inferential = plot_inferential(boot_data, title, n_total_specs)
fig_inferential.show()

p-Value Histogram¶

In [ ]:
fig_p_hist = plot_p_hist(specs, title, n_total_specs)
fig_p_hist.show()

Multiverse¶

In [ ]:
fig = plot_multiverse(
    specs,
    n_total_specs,
    k_range,
    cluster_fill_data,
    spec_fill_data,
    labels,
    colors,
    level,
    title,
    fill_levels
)
fig.show()

# fig.write_image("multiverse.pdf")
# fig.write_image("multiverse.pdf", width=1000, height=1500)

Individual Multiverse Components¶

In [ ]:
fig_cluster_tiles = plot_cluster_tiles(specs, cluster_fill_data, n_total_specs, title)
fig_cluster_tiles.show()
In [ ]:
fig_caterpillar = plot_caterpillar(specs, n_total_specs, colors, k_range, title, fill_levels)
fig_caterpillar.show()
In [ ]:
fig_cluster_size = plot_cluster_size(specs, k_range, n_total_specs, title)
fig_cluster_size.show()
In [ ]:
fig_sample_size = plot_sample_size(specs, k_range, n_total_specs, title)
fig_sample_size.show()
In [ ]:
fig_spec_tiles = plot_spec_tiles(specs, n_total_specs, spec_fill_data, labels, colors, k_range, title, fill_levels)
fig_spec_tiles.show()